# Section 9: Calling Conventions, Cache and TLB, Demand Paging

# CS162

# $March\ 16,\ 2016$

# Contents

|   |                         | 2 |
|---|-------------------------|---|
|   | 1.1 Calling Conventions | 2 |
|   | 1.2 Argument Passing    | 2 |
| 2 | Vocabulary              | 4 |
| _ | Problems                | 5 |
|   | 3.1 Caching             | 5 |
|   | 3.2 Demand Paging       |   |
|   | 3.3 Virtual Memory      | 7 |
|   |                         |   |

### 1 Calling Conventions and Argument Passing

#### 1.1 Calling Conventions

```
Sketch the stack frame of helper before it returns.
void helper(char* str, int len) {
  char word[len];
  strncpy(word, str, len);
  printf("%s", word);
  return;
}
int main(int argc, char *argv[]) {
  char* str = "Hello World!";
  helper(str, 13);
}
```

```
13
str
return address
saved EBP
!
d
1
...
1
e
H
```

#### 1.2 Argument Passing

Fill out the following code to copy integer arguments onto the stack. You can change the ESP by modifying if\_->esp.

```
void populate_stack_args(int argc, uint32_t *argv, struct intr_frame *if_) {
    // The stack pointer is available in if_->esp (which has the type uint32_t*)
    int i;

for (_______) {
    if_->esp = _____;
    ____ = argv[i];
}
```

```
void populate_stack_args(int argc, uint32_t* argv, struct intr_frame* if_) {
    // The stack pointer is available in if_->esp (which has the type uint32_t*)
    int i;
```

```
for (i = argc-1; i >= 0; i--) {
    if_->esp = if_->esp - 1;
    *if_->esp = argv[i];
}
```

### 2 Vocabulary

- Calling Conventions Describe the interface of called code. Define the order in which atomic parameters are allocated, how parameters are passed, which registers the callee must preserve for the caller, how the stack is prepared and restored, etc.
- %esp, %ebp In a stack frame (on x86 architecture), the %esp register holds the address of the last thing on the stack. The %esp pointer gets moved as things get added or removed from the stack. The %ebp register stores the position of the beginning of the current stack frame.
- Compulsory Miss The miss that occurs on the first reference to a block. There's essentially nothing that you can do about this type of miss, but over the course of time, compulsory misses become insignificant compared to all the other memory accesses that occur.
- Capacity Miss This miss occurs when the cache can't contain all the blocks that the program accesses. Usually the solution to capacity misses is to increase the cache size.
- Conflict Miss Conflict misses occur when multiple memory locations are mapped to the same cache location. In order to prevent conflict misses, you should either increase the cache size or increase the associativity of the cache.
- Coherence Miss Coherence misses are caused by external processors or I/O devices that updates
  what's in memory.
- Working set The subset of the address space that a process uses as it executes. Generally we can say that as the cache hit rate increases, more of the working set is being added to the cache.

#### 3 Problems

#### 3.1 Caching

An up-and-coming big data startup has just hired you do help design their new memory system for a byte addressable system. Suppose the virtual and physical memory address space is 32 bits with a 4KB page size.

First, you create 1) a direct mapped cache and 2) a fully associative cache of the same size that replaces the least recently used pages when the cache is full. You run a few tests and realize that the fully associative cache performs much worse than the direct mapped cache. What's a possible access pattern that could cause this to happen?

Let's say each cache held X amount of blocks. An access pattern would be to repeatedly iterate over X+1 consecutive blocks, which would cause everything in the fully associated cache to miss.

Instead, your boss tells you to build a 8KB 2-way set associative cache with 64 byte cache blocks. How would you split a given virtual address into its tag, index, and offset numbers?

Since it's two way set associative, the cache is split into two 4KB banks. The offset will take 6 bytes, since  $2^{64} = 6$ . Each bank can store 64 blocks, since  $2^{12}/2^6 = 2^6$ , so there will be 6 index bits. Then the rest of the bits will be for the tag. It will look like this:

| 20 Bits | 6 Bits | 6 Bits |
|---------|--------|--------|
| Tag     | Index  | Offset |

You finish building the cache, and you want to show your boss that there was a significant increase in average read time. Suppose your system uses a two level page table to translate virtual addresses and your system uses the cache for the translation tables and data. Each memory access takes 50ns, the cache lookup time is 5ns, and your cache hit rate is 90%. What is the average time to read a location from memory?

Since the page table has two levels, there are three reads for each access. For each access, the average access time is: 0.9\*5+0.1\*(5+50)=10ns. Since there are 3 accesses, we multiply this by 3 to get an average read time of 30ns.

#### 3.2 Demand Paging

Your boss has been so impressed with your work designing the caching that he has asked for your advice on designing a TLB to use for this system. Suppose you know that there will only be 4 process running at the same time, each with a maximum size of 512MB and a working set size of 256KB. Assuming the same system as the previous problem (32 bit virtual and physical address space, 4KB page size), what is the minimum amount of TLB entries that your system would need to support to be able to map the working set size for one process? What happens if you have more entries? What about less?

A process has a working set size of 256KB which means that the working set fits in 64 pages. This means our TLB should have 64 entries. If you have more entries, then performance will increase since the process often has changing working sets, and it should be able to store more in the TLB. If it has less, then it can't easily translate the addresses in the working set and performance will suffer.

Suppose you run some benchmarks on the system and you see that the system is utilizing over 99% of its paging disk IO capacity, but only 10% of its CPU. What is a combination of the of disk space and memory size that can cause this to occur? Assume you have TLB entries equal to the answer from the previous part.

The CPU can't run very often without having to wait for the disk, so it's very likely that the system is thrashing. There isn't enough memory for the benchmark to run without the system page faulting and having to page in new pages. Since there will be 4 processes that have a maximum size of 512MB each, this can occur as long as the memory size is under 2GB regardless of the number of TLB entries and disk size.

Out of increasing the size of the TLB, adding more disk space, and adding more memory, which one would lead to the largest performance increase and why?

We should add more memory so that we won't need to page in new pages as often.

#### 3.3 Virtual Memory

vmstat is a Linux performance debugging tool that provides information about virtual memory on your system. When you run it, the output looks like this:

#### \$ vmstat 1 procs ------memory------ ---swap-- ---io--- -system-- ----cpu---swpd free buff cache si bi bo in cs us sy id wa st SO 0 174184 1007372 96316 642 3095 678 123 128 0 1 99 0 0 49 0 48 0 174240 1007372 96316 0 0 0 88 0 0 100 0 0 0 0 0 174240 1007372 96316 0 0 0 0 33 75 0 0 100 0 0 0 0 174240 1007372 96316 0 0 0 0 32 73 0 0 100

The 1 means "recompute the stats every 1 second and print them out". The first line contains the average values since boot time, while the second line contains the averages of the last second (current

```
averages). Here's a reference for what each one of the columns means.
   Procs
       r: The number of runnable processes (running or waiting for run time).
       b: The number of processes in uninterruptible sleep.
   Memory
       swpd: the amount of virtual memory used.
       free: the amount of idle memory.
       buff: the amount of memory used as buffers.
       cache: the amount of memory used as cache.
       inact: the amount of inactive memory. (-a option)
       active: the amount of active memory. (-a option)
   Swap
       si: Amount of memory swapped in from disk (/s).
       so: Amount of memory swapped to disk (/s).
   ΙO
       bi: Blocks received from a block device (blocks/s).
       bo: Blocks sent to a block device (blocks/s).
   System
       in: The number of interrupts per second, including the clock.
       cs: The number of context switches per second.
   CPU
       These are percentages of total CPU time.
       us: Time spent running non-kernel code.
                                                 (user time, including nice time)
       sy: Time spent running kernel code. (system time)
       id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
       wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
       st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
```

Take a look at these 3 programs (A, B, C).

```
char *buffer[4 * (1L << 20)];</pre>
int A(int in) {
 // "in" is a file descriptor for a file on disk
 while (read(in, buffer, sizeof(buffer)) > 0);
int B() {
   size_t size = 5 * (1L << 30);
   int *x = malloc(size);
   memset(x, 0xCC, size);
}
sem_t sema;
pthread_t thread;
void *foo() { while (1) sem_wait(&sema); }
int C() {
   pthread_create(&thread, NULL, foo, NULL);
    while (1) sem_post(&sema);
}
```

I ran these 3 programs one at a time, but in a random order. What order did I run them in? Can you tell where (in the vmstat output) one program stopped and another started? Explain.

```
procs ------memory------ ---swap-- ----io---- -system-- -----cpu-----
r
   b
       swpd
              free
                     buff cache
                                   si
                                              bi
                                                    bo
                                                         in
                                                               cs us sy id wa st
                                         so
0
   0 684688
             25216 1822136
                            60860
                                    75
                                        748
                                             3645
                                                    779 146
                                                              296 1 1 98 0
   0 684688
             25268 1822136
                            60868
                                     0
                                          0
                                                0
                                                      0 18150 735898 6 44 51
1
                                                                               0
                                          0
   0 684688
             25268 1822136
                             60868
                                     0
                                                0
                                                      0 61864 1270088
                                                                       6 77 17
                                                                                0
                                          0
   0 684688
             25268 1822136
                            60868
                                     0
                                                0
                                                      0 59497 1102825
                                                                      8 71 21
1
1
   0 684688
             25268 1822136
                             60868
                                     0
                                          0
                                                0
                                                      0 94619 766431 11 79 10
   0 684688
             25612 1822136
                            60868
                                     0
                                          0
                                                0
                                                       0 13605 237430
                                                                     2 13 85
   0 684688
             25612 1822136
                            60868
                                          0
                                                0
                                                           61 115
                                                                     1 100
3
   0 694520 18544
                      3212 45040
                                   64 11036
                                               264 11144 2647 2339
                                                                   5 51 43 0
   1 1285828 20560
                       128
                              580
                                    88 592440 14248 592440 18289 2171
                                                                       3 58 36
3
   0 1866176
              21492
                       128
                              2132
                                     0 578404 8972 578404 47646 1691
                                                                       2 70 28
                                     0 487732 11708 487788 17404 1881
3
   0 2350636
              17820
                       136
                             2640
                                                                       1 58 39
2
   0 2771016 22168
                       544
                              4360 2072 417272 15372 417272 17460 2192
                                                                       2 57
                                                                                 3
0
   0 697036 1922160
                       560
                             9712 1516 418224 16508 418228 47747 2616
                                                                       0 64 30
                                               288
   0 697032 1921696
                       564
                            10096
                                    28
                                          0
                                                         77 148 0 0 100
   0 696980 878128 1037720
                                   412
                                          0 1038840
                                                        0 11128 14854
                                                                       1 25 54 21
1
                             11272
   0 696980
             21732 1895476
                                     0
                                          0 1286460
                                                        0 13610 18224
                                                                       0 31 46 22
1
                             9348
0
   2 696980
             20992 1896496
                             9072
                                     0
                                          0 1297536
                                                       20 13745 19164
                                                                       0 36 43 21
   1 696980
             20228 1897784
                              8648
                                     0
                                          0 1283324
                                                       32 13659 18931
                                                                       0 36 41 23
   1 696960
             21048 1897404
                              8716
                                     48
                                          0 1215152
                                                        0 12601 17672
                                                                       0 34 45 21
   0 696952
             23048 1899112
                              9004
                                     8
                                          0 470112
                                                       0 5100 7073 0 13 81
   0 696952
             23048 1899112
                              9004
                                                0
                                                          45
                                                                89 0 0 100
```

I ran C (rows 2-6), B (rows 8-13), then A (rows 15-20). Program C has a lot of context switches (cs), but no swap or IO activity. Both program A and B have a lot of disk I/O, but program A is read-only IO (bi) and program B is the only one that should be swapping to disk (so).

If you have extra available physical memory, Linux will use it to cache files on disk for performance benefits. This disk cache may also include parts of the swapfile. Why would caching the swapfile be better than paging-in those pages immediately?

If the system's memory needs grow, the disk cache must shrink. If we avoid paging-in those cached parts of the swapfile, we can immediately evict the swapfile from memory without needing to modify any page tables.

If I remove the line "memset(x, 0xCC, size);" from program B, I notice that the **vmstat** output does not have a spike in swap (si and so) nor in io (bi and bo). My system doesn't have enough physical memory for a 5GB array. Yet, the array is not swapped out to disk. Where does the array go? Why did the memset make a difference?

The memory returned by malloc is sparsely allocated. It won't actually be allocated until I begin using it. The memset will force the kernel to materialize those pages, because I write to them.

Program B has a 5GB array, but the whole thing just contains 0xccccccc. Based on this observation, can you think of a way to reduce program B's memory footprint without changing any of program B's code? (What can the kernel do to save memory?)

The kernel could compress pages in memory. It's usually faster to uncompress memory than it is to page-in from disk. If the entire array is the same byte repeated over and over, then the kernel could achieve very good compression ratios on the array. The kernel must mark infrequently-used pages as invalid, so that it has a change to uncompress those pages inside the page fault exception handler.